
MDEV-38936 Proactive handling of InnoDB tablespace full condition #4721

Open
FarihaIS wants to merge 1 commit into MariaDB:main from FarihaIS:mdev-38936

Conversation

@FarihaIS

@FarihaIS FarihaIS commented Mar 2, 2026

Contributor

Description

InnoDB write failures occur when tablespace files exceed filesystem size limits (e.g. 16 TiB on ext4 or 2 TiB on ext3 with 4 KiB blocks; the limit varies by filesystem). The current behavior logs errors but continues accepting transactions, causing repeated failures, user disruption, and potential data integrity issues.

Add proactive monitoring by emitting warnings when InnoDB tablespaces approach a configurable size threshold.

Key features:

  • Two new system variables:
    • innodb_tablespace_size_warning_threshold (default 0, disabled): Tablespace size in bytes that the warning percentage is measured against
    • innodb_tablespace_size_warning_pct (default 85%): Percentage of threshold at which to start emitting warnings
  • Warning frequency:
    • Below warning_pct: No warnings
    • At or above warning_pct: Every 1% increase (85%, 86%, 87%, etc.)
  • Per-tablespace tracking with automatic reset on TRUNCATE/DROP or threshold/percentage changes
  • Negligible overhead when threshold is 0 (only the threshold value is checked)
  • Progressive warnings capped at 100%
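To make the warning-frequency rules above concrete, here is a minimal standalone sketch in C++. It assumes the current size is already known in bytes and that the threshold is nonzero; the struct and function names are hypothetical and do not reproduce the PR's fil_space_t code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-tablespace state (stands in for m_last_size_warning_pct).
struct size_warning_state
{
  unsigned last_pct= 0; // last percentage reported; 0 means none yet
};

// Returns the percentage to report, or 0 if no warning is due.
// threshold: innodb_tablespace_size_warning_threshold in bytes (nonzero)
// warning_pct: innodb_tablespace_size_warning_pct
unsigned size_warning_pct(size_warning_state &st, uint64_t size_bytes,
                          uint64_t threshold, unsigned warning_pct)
{
  assert(threshold != 0); // callers skip the check entirely when it is 0
  // InnoDB sizes stay below 1ULL << 48 bytes, so this cannot overflow.
  uint64_t pct= size_bytes * 100 / threshold; // rounds down
  if (pct > 100)
    pct= 100;                                 // warnings are capped at 100%
  if (pct < warning_pct || pct <= st.last_pct)
    return 0;                                 // below start, or no new 1% step
  st.last_pct= unsigned(pct);
  return unsigned(pct);
}
```

In this sketch a single extension that jumps from 85% to 90% produces one warning at 90%, and nothing more is reported once 100% has been logged.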

Implementation adds fil_space_t::extend(), which consolidates file extension, the size_in_header update, and size warning checks. Per-tablespace warning state is tracked in fil_space_t (m_last_size_warning_pct, m_last_warning_threshold, m_last_warning_pct).
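A hypothetical sketch of how the three tracked members could implement the reset-on-change rule. The member names follow the list above, but the struct, reset() and sync_settings() are illustrative assumptions rather than the PR's code; the threshold is kept as a page count, like the threshold_pages variable discussed later in the review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the warning state kept in fil_space_t.
struct tablespace_warning_state
{
  uint32_t m_last_warning_threshold= 0; // threshold (in pages) last seen
  uint8_t m_last_warning_pct= 0;        // warning_pct last seen
  uint8_t m_last_size_warning_pct= 0;   // last percentage reported

  // Forget reported progress, e.g. after TRUNCATE TABLE.
  void reset() noexcept { m_last_size_warning_pct= 0; }

  // Start over if either setting changed since the previous check.
  void sync_settings(uint32_t threshold_pages, uint8_t warning_pct) noexcept
  {
    assert(warning_pct <= 100);
    if (m_last_warning_threshold != threshold_pages ||
        m_last_warning_pct != warning_pct)
    {
      reset();
      m_last_warning_threshold= threshold_pages;
      m_last_warning_pct= warning_pct;
    }
  }
};
```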

Release Notes

Added proactive InnoDB tablespace size monitoring to help avoid failures caused by filesystem file size limits. Two new system variables enable configurable warning thresholds with incremental warning frequency:

  • innodb_tablespace_size_warning_threshold (default 0, disabled): Tablespace size in bytes that warnings are measured against
  • innodb_tablespace_size_warning_pct (default 85%): Percentage of the threshold at which warnings start

Warning frequency:

  • Below configured percentage: no warnings
  • At or above configured percentage: every 1% increase
  • Threshold set to 0: warnings disabled

How can this PR be tested?

Run the innodb.tablespace_size_warning test with mysql-test-run; this commit adds it to the innodb suite.

The test validates:

  1. Both system variables are visible and have correct default values
  2. Basic warning emission when tablespace exceeds configured percentage
  3. Configurable warning percentage (tests both 70% and 80% settings)
  4. Setting the threshold to 0 disables warnings; re-enabling with a nonzero threshold resumes them
  5. TRUNCATE TABLE resets warning state
  6. Behavior when tablespace exceeds 100% of threshold (warnings cap at 100%)

Expected warning behavior in error log:

  • Below innodb_tablespace_size_warning_pct (default 85%): No warnings

  • At or above innodb_tablespace_size_warning_pct: Every 1% increase

    Example: [Warning] InnoDB: Tablespace 'test/t1' size 7340032 bytes reached 70% of configured threshold of 10485760 bytes
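The sample line could be produced by a format along the lines sketched below. This is an assumption inferred only from the example text: the server emits the message through sql_print_warning() (per the discussion further down), which presumably adds the timestamp and the [Warning] tag, so std::snprintf stands in here to keep the sketch self-contained. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the message body shown in the example (hypothetical helper).
std::string format_size_warning(const char *name, unsigned long long size,
                                unsigned pct, unsigned long long threshold)
{
  assert(pct <= 100); // percentages are capped at 100
  char buf[256];
  std::snprintf(buf, sizeof buf,
                "InnoDB: Tablespace '%s' size %llu bytes reached %u%% "
                "of configured threshold of %llu bytes",
                name, size, pct, threshold);
  return buf;
}
```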

Basing the PR against the correct MariaDB version

  • This is a new feature, and the PR is based against the main branch.

Copyright

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

@CLAassistant

CLAassistant commented Mar 2, 2026

CLA assistant check
All committers have signed the CLA.

@grooverdan grooverdan added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 2, 2026
@mikegriffin

Contributor

Feature request: allow additional use cases (for example, a tablespace that becomes larger than expected):

  • Make the warning percentage configurable (tablespace_size_warning_pct) so warnings can start earlier without changing the byte threshold
  • Replace hard-coded 90 with a named constant (high_resolution_pct)
  • No change above high_resolution_pct: print on every 1% increase
  • Between tablespace_size_warning_pct and high_resolution_pct: print at most twice per 10% (for example, 70%, 77%, 81%, 89%, 90%, 91%, 92%)
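A sketch of this proposed policy in C++, with hypothetical names. It only illustrates the suggestion; the PR description above says the current revision warns on every 1% increase from warning_pct onward. Feeding it 70, 77, 78, 81, 89, 90, 91, 92 with a 70% start prints the example sequence and suppresses 78 as the third report in the 70-79% band.

```cpp
#include <cassert>

// Hypothetical constant from the proposal; not part of the PR.
constexpr unsigned high_resolution_pct= 90;

struct throttled_warner
{
  unsigned last_pct= 0;      // last percentage printed
  unsigned last_decile= ~0u; // pct / 10 of the band last printed in
  unsigned in_decile= 0;     // warnings already printed in that band

  // Returns true if a warning should be printed at percentage pct.
  bool should_warn(unsigned pct, unsigned warning_pct)
  {
    assert(pct <= 100);      // callers cap the percentage at 100
    if (pct < warning_pct || pct <= last_pct)
      return false;
    if (pct < high_resolution_pct)
    {
      const unsigned decile= pct / 10;
      if (decile != last_decile)
      {
        last_decile= decile; // entering a new 10% band
        in_decile= 0;
      }
      if (in_decile == 2)
        return false;        // already printed twice in this band
      in_decile++;
    }
    // At or above high_resolution_pct every 1% increase is printed.
    last_pct= pct;
    return true;
  }
};
```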

@FarihaIS FarihaIS marked this pull request as ready for review March 2, 2026 23:10
@Thirunarayanan Thirunarayanan requested review from dr-m and iMineLink March 3, 2026 05:01
@FarihaIS FarihaIS force-pushed the mdev-38936 branch 3 times, most recently from 64ab2ed to 5bd3d38 March 3, 2026 17:30
@FarihaIS

FarihaIS commented Mar 3, 2026

Contributor Author

@mikegriffin I have just pushed some new changes. Could you please take a look and confirm whether the new implementation addresses the additional use cases you mentioned above? Thank you!

@iMineLink iMineLink left a comment

Contributor

Thanks for your contribution!
I left a few comments on the feature.
Since it's only adding logs and not solving actual bugs related to excessive InnoDB tablespace size (like the recently discovered MDEV-38898), please also wait for @dr-m's comments.
Nevertheless, it's fair to say that the feature, when disabled, seems to have a small runtime cost: checking a variable in an ATTRIBUTE_COLD function, plus the new members of fil_space_t, whose footprint could be reduced by reordering them to avoid padding, or eliminated by storing only the high 32 bits of the threshold and reordering.
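An illustrative C++ sketch of the layout point, using hypothetical structs rather than the real fil_space_t. It compares a 64-bit byte threshold declared between two small fields, the same fields reordered largest-first, and a 32-bit page count as in the later threshold_pages revision. Exact sizes depend on the ABI; the static_asserts only check relations that hold on common ABIs.

```cpp
#include <cstdint>

// Illustrative layouts only; these are not the real fil_space_t members.
struct state_bytes           // 64-bit byte threshold between small fields
{
  uint8_t last_size_warning_pct;
  uint64_t last_warning_threshold;
  uint8_t last_warning_pct;
};

struct state_bytes_reordered // same members, largest first
{
  uint64_t last_warning_threshold;
  uint8_t last_size_warning_pct;
  uint8_t last_warning_pct;
};

struct state_pages           // 32-bit page count instead of bytes
{
  uint32_t last_warning_threshold;
  uint8_t last_size_warning_pct;
  uint8_t last_warning_pct;
};

// On x86-64 these are typically 24, 16 and 8 bytes respectively.
static_assert(sizeof(state_bytes_reordered) <= sizeof(state_bytes),
              "reordering never adds padding here");
static_assert(sizeof(state_pages) < sizeof(state_bytes_reordered),
              "a 32-bit threshold shrinks the state further");
```

In the real class the neighbouring members also matter, so the gain depends on where the new fields are placed.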

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/include/fil0fil.h
Comment thread mysql-test/suite/innodb/t/tablespace_size_warning.test
Comment thread storage/innobase/handler/ha_innodb.cc
Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc
Comment thread mysql-test/suite/innodb/r/tablespace_size_warning.result Outdated
@FarihaIS FarihaIS force-pushed the mdev-38936 branch 3 times, most recently from 5a0d8ee to 7c0e2a0 March 13, 2026 18:30
@FarihaIS

Contributor Author

@iMineLink Thank you for the detailed review! I have addressed all your comments and updated the PR description to reflect the latest version of the feature. Please let me know if I have missed anything, thank you.

I will wait for @dr-m's review in the meantime.

@FarihaIS FarihaIS requested a review from iMineLink March 13, 2026 21:28

@iMineLink iMineLink left a comment

Contributor

Thanks for addressing the previous review points. I have just a couple more points, then it's good for me.

Comment thread storage/innobase/include/fil0fil.h Outdated
Comment thread mysql-test/suite/innodb/t/tablespace_size_warning.test
@FarihaIS

Contributor Author

@iMineLink thank you for the suggestions again, I've addressed all the new comments as well!

Please let me know if you have any other thoughts while we wait for @dr-m's review.

@FarihaIS FarihaIS requested a review from iMineLink March 17, 2026 21:15

@iMineLink iMineLink left a comment

Contributor

Thank you @FarihaIS, the changes look good to me!

As a note, the feature in the current state will be enabled by default.

Please wait for @dr-m review, thanks.

@dr-m dr-m left a comment

Contributor

What is this good for? Do you have an example of already implemented external monitoring that would react when some warning messages appear in the server error log?

Could we have something that would better integrate with event handlers and other existing mechanisms?

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/srv/srv0srv.cc Outdated

@gkodinov gkodinov left a comment

Member

This is a preliminary review. LGTM. Please keep working with Marko on his review.

@FarihaIS FarihaIS force-pushed the mdev-38936 branch 2 times, most recently from 59cedaa to 1f4e0c0 April 9, 2026 19:35
@FarihaIS

FarihaIS commented Apr 9, 2026

Contributor Author

@dr-m thank you for your feedback. I have addressed the two code changes you requested above now. Please let me know if these changes look okay or if they need further modification.

As for the questions you asked above,

What is this good for? Do you have an example of already implemented external monitoring that would react when some warning messages appear in the server error log?

These warnings would be helpful for external monitoring tools, for example, AWS RDS, which monitors the error log for operational alerts. This follows the same pattern as existing InnoDB warnings (undo truncation, system tablespace full, etc.).
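For illustration, a monitoring agent could extract the fields from such a line roughly as sketched below (C++17). The pattern assumes the message wording shown in the PR description stays stable; the struct and function are hypothetical and are not part of AWS RDS or MariaDB tooling.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical parsed form of one size warning.
struct size_warning
{
  std::string tablespace;
  unsigned long long size;      // bytes
  unsigned pct;
  unsigned long long threshold; // bytes
};

// Returns the fields of a size warning found in an error-log line, assuming
// the message wording from the PR description; std::nullopt otherwise.
std::optional<size_warning> parse_size_warning(const std::string &line)
{
  static const std::regex re(
      R"(\[Warning\] InnoDB: Tablespace '([^']+)' size (\d+) bytes )"
      R"(reached (\d+)% of configured threshold of (\d+) bytes)");
  std::smatch m;
  if (!std::regex_search(line, m, re))
    return std::nullopt;
  assert(m.size() == 5); // whole match plus four capture groups
  return size_warning{m[1].str(), std::stoull(m[2].str()),
                      unsigned(std::stoul(m[3].str())),
                      std::stoull(m[4].str())};
}
```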

Could we have something that would better integrate with event handlers and other existing mechanisms?

Could you please help guide me to the kind of integration you're looking for? I'm not entirely sure what the new approach would look like, but I'm happy to make the changes once I have a clearer understanding.

@FarihaIS FarihaIS requested review from dr-m and iMineLink April 9, 2026 23:44
@grooverdan

Member

Could we have something that would better integrate with event handlers and other existing mechanisms?

Could you please help guide me to the kind of integration you're looking for? I'm not entirely sure what the new approach would look like, but I'm happy to make the changes once I have a clearer understanding.

I think @dr-m is after best practices for log messages in general and for tooling integration. So perhaps MDEV-27147 (JSON error log to STDERR/STDOUT as an option), and perhaps https://opentelemetry.io/docs/specs/otel/logs/data-model/#events

Comment thread storage/innobase/include/fil0fil.h Outdated
Comment thread storage/innobase/include/fil0fil.h Outdated
Comment thread storage/innobase/include/fil0fil.h Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc
Comment thread storage/innobase/fsp/fsp0fsp.cc
Comment thread storage/innobase/handler/ha_innodb.cc Outdated
@FarihaIS

Contributor Author

@grooverdan Thanks for the pointers! Since this uses sql_print_warning(), it would automatically benefit from structured output once MDEV-27147 (JSON error log) lands, right? Is there any special handling needed on our side? What about for OpenTelemetry integration as well?

@FarihaIS

FarihaIS commented Apr 24, 2026

Contributor Author

@dr-m thank you for the detailed feedback! I've addressed/responded to all your comments above - could you please take a look and see if there are any other changes needed?

@dr-m dr-m left a comment

Contributor

Sorry for the delay. I think that as part of this, we must pay back some maintenance debt of the fil_space_extend() function.

Comment thread mysql-test/suite/innodb/r/tablespace_size_warning.result Outdated
Comment thread mysql-test/suite/innodb/t/tablespace_size_warning.test Outdated
Comment thread mysql-test/suite/innodb/t/tablespace_size_warning.test Outdated
Comment thread storage/innobase/include/fil0fil.h Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
@FarihaIS

FarihaIS commented Jun 8, 2026

Contributor Author

@dr-m thank you for the detailed feedback again! I've addressed all your comments above - could you please take a look and see if there are any other changes needed?

@dr-m dr-m left a comment

Contributor

The current base d755574 is 555 commits behind the current target branch, or 66 commits if merges are counted as a single commit. This could explain a few compilation and test failures.

Can you please rebase this to the latest main branch?

This should be OK to push after addressing my review comments.

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment on lines +653 to +656
  const uint warning_pct= fil_system.tablespace_size_warning_pct;

  if (threshold == 0)
    return false;

Contributor

We’re unnecessarily reading warning_pct if threshold==0. Actually, I see that my GCC 16.1.0 is reordering the load of warning_pct after the early return false. We could explicitly do that in the source code, to benefit less aggressively optimizing compilers.

Contributor Author

Good point, moved the reading of warning_pct to after the check for threshold == 0

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated

  /* Reset state if threshold or warning percentage changed */
  const uint32_t threshold_pages=
    static_cast<uint32_t>(threshold / physical_size());

Contributor

The constructor-style narrowing cast uint32_t(threshold / physical_size()) would be less of an eyesore; the same applies to the uint8_t and uint64_t casts in this function.

Contributor Author

Makes sense - changed to constructor-style casting for all 4 occurrences
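A small sketch of the cast spellings discussed here, with illustrative constants. For arithmetic types, the functional cast uint32_t(x) performs the same conversion as static_cast<uint32_t>(x). The braced form that appears later in this review, uint{m_last_warning_pct}, is different: list-initialization rejects narrowing, so it only suits widening conversions.

```cpp
#include <cstdint>

constexpr uint64_t threshold= 10485760; // 10 MiB, illustrative
constexpr uint32_t page_size= 16384;    // 16 KiB pages, illustrative

// Both spellings perform the same (possibly narrowing) conversion.
constexpr uint32_t a= static_cast<uint32_t>(threshold / page_size);
constexpr uint32_t b= uint32_t(threshold / page_size);
static_assert(a == b && b == 640, "same value either way");

// Conversion to an unsigned type keeps the value modulo 1ULL << 32.
static_assert(uint32_t((1ULL << 32) | 5) == 5, "high bits are dropped");

// Braced initialization allows widening such as unsigned{uint8_t}, but
// uint32_t{x} would be ill-formed for a non-constant uint64_t x.
constexpr uint8_t pct= 85;
static_assert(unsigned{pct} == 85, "widening is fine");
```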

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment on lines +661 to +662
  if (m_last_warning_threshold != threshold_pages ||
      m_last_warning_pct != warning_pct) {

Contributor

The following would lead to shorter code with fewer conditional branches:

  if ((m_last_warning_threshold ^ threshold_pages) |
      (uint{m_last_warning_pct} ^ warning_pct)) {

I was astonished by the size difference this made on the compiler output: from over 900 bytes of x86-64-v3 code to a bit over 630. But, I did not check if there were any changes to function inlining.

Contributor Author

That's incredible, I've changed the if statement to match the XOR one above, thank you!
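A sketch of why the two conditions agree: x ^ y is zero exactly when x == y, so OR-ing the two XOR results is nonzero exactly when at least one pair differs, and no short-circuit evaluation is needed. The parameter names are simplified stand-ins for the fil_space_t members; whether a particular compiler really emits fewer branches is what the measurement above reports, not something this sketch guarantees.

```cpp
#include <cstdint>

// Short-circuit form: may compile to two conditional branches.
constexpr bool changed_branchy(uint32_t last_threshold, uint8_t last_pct,
                               uint32_t threshold, unsigned pct)
{
  return last_threshold != threshold || last_pct != pct;
}

// Form suggested in the review: a ^ b is 0 only when a == b, so the OR of
// the two XORs is nonzero exactly when at least one pair differs.
constexpr bool changed_xor(uint32_t last_threshold, uint8_t last_pct,
                           uint32_t threshold, unsigned pct)
{
  return ((last_threshold ^ threshold) | (unsigned{last_pct} ^ pct)) != 0;
}

static_assert(!changed_xor(640, 85, 640, 85), "nothing changed");
static_assert(changed_xor(640, 85, 1280, 85), "threshold changed");
static_assert(changed_xor(640, 85, 640, 70), "percentage changed");
```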

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment on lines +670 to +671
  /* new_size is at most 2^32 and physical_size() at most 2^16,
     so current_bytes * 100 < 2^55, well within uint64_t range. */

Contributor

I’d use the C notation in C-style comments: 2^32 is 0x22 (34) to me. Let us write 1<<32 and so on, or 1ULL<<32 if we are pedantic.

Contributor Author

Sounds good, changed the comments to use C-style notation!
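The bound stated in that comment can be checked at compile time using the 1ULL<<n notation suggested above. This is only a sketch of the arithmetic; the constant names are hypothetical.

```cpp
#include <cstdint>

// Upper bounds quoted in the comment (names are hypothetical).
constexpr uint64_t max_size_pages= 1ULL << 32; // new_size
constexpr uint64_t max_page_bytes= 1ULL << 16; // physical_size()
constexpr uint64_t max_bytes= max_size_pages * max_page_bytes;

static_assert(max_bytes == 1ULL << 48, "at most 1ULL << 48 bytes");
// 100 < 128 == 1 << 7, so max_bytes * 100 < 1ULL << 55 < UINT64_MAX.
static_assert(max_bytes * 100 < 1ULL << 55, "current_bytes * 100 fits");
```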

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment on lines +680 to +682
  /* Warn on every 1% increase */
  if (display_pct <= m_last_size_warning_pct)
    return false;

Contributor

The comment would be clearer if it were placed after the if block.

Contributor Author

Agreed, moved to after the if block

Comment on lines +706 to +707
  if (!fil_space_extend(this, size))
    return false;

Contributor

I was about to point out that fil_space_extend() should have been declared static, but I now see that xtrabackup_apply_delta() is invoking that function. This seems to be the simplest solution after all.

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment on lines +720 to +721
  mtr->write<4,mtr_t::FORCED>(*header, FSP_HEADER_OFFSET + FSP_SIZE
                              + header->page.frame, size_in_header);

Contributor

Only in the old TAB-based InnoDB formatting style do we split lines before a binary operator such as +. Here, the + should be placed at the end of the first line.

Contributor Author

That's true, thanks for catching that - moved the + to the end of the first line!

Comment thread storage/innobase/fsp/fsp0fsp.cc Outdated
Comment on lines +647 to +652
/** Check if tablespace size exceeds warning threshold and emit warning.
@param new_size New size in pages
@return true if warning was emitted */
bool fil_space_t::check_size_warning(uint32_t new_size) noexcept
{
  const ulonglong threshold= fil_system.tablespace_size_warning_threshold;

Contributor

This function must be declared and defined with the inline attribute, because there is only a single caller, in a likely code path.

Even better could be to declare and define this function as ATTRIBUTE_COLD and make it take a nonzero fil_system.tablespace_size_warning_threshold as a parameter. The only caller would check it:

  if (uint64_t threshold= fil_system.tablespace_size_warning_threshold)
    return check_size_warning(new_size, threshold);

In this way, the impact on the code cache should be minimized in deployments where this feature is disabled by default.

Contributor Author

Added all the changes you suggested above, thank you
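A standalone sketch of the suggested hot/cold split, assuming ATTRIBUTE_COLD expands to the GCC/Clang cold attribute (a local stand-in is defined here). The globals and the simplified percentage test are hypothetical. The hot caller only loads the threshold, and warning_pct is read inside the cold function after a nonzero threshold has been seen, which also reflects the earlier load-ordering remark.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in, assuming ATTRIBUTE_COLD maps to the cold attribute.
#if defined(__GNUC__)
# define ATTRIBUTE_COLD __attribute__((cold))
#else
# define ATTRIBUTE_COLD
#endif

// Hypothetical globals standing in for the fil_system settings.
static uint64_t tablespace_size_warning_threshold= 0; // 0 = disabled
static unsigned tablespace_size_warning_pct= 85;

// Slow path: only reached with a nonzero threshold, so the compiler may
// place it away from frequently executed code.
ATTRIBUTE_COLD static bool check_size_warning(uint64_t size_bytes,
                                              uint64_t threshold)
{
  assert(threshold != 0);
  // Read only after the caller has seen a nonzero threshold.
  const unsigned warning_pct= tablespace_size_warning_pct;
  // Simplified: the PR's check also tracks the last reported percentage.
  return size_bytes * 100 / threshold >= warning_pct;
}

// Hot path: a single load and branch while the feature is disabled.
static inline bool maybe_check_size_warning(uint64_t size_bytes)
{
  if (const uint64_t threshold= tablespace_size_warning_threshold)
    return check_size_warning(size_bytes, threshold);
  return false;
}
```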

@gkodinov gkodinov left a comment

Member

@FarihaIS I will verify the changes resulting from the last comment by @dr-m. Withdrawing my approval because of that. Please re-request my review when done with these.
FYI, I've also rebased the branch to the latest trunk.

@gkodinov gkodinov removed the request for review from iMineLink June 10, 2026 09:42
InnoDB write failures occur when tablespace files exceed filesystem size
limits. Current behavior logs errors but continues accepting
transactions, causing repeated failures and potential data integrity
issues.

Add proactive monitoring by emitting warnings when InnoDB tablespaces
approach a configurable size threshold.

Key features:
- Two new system variables:
  * innodb_tablespace_size_warning_threshold (default 0, disabled):
    Tablespace size in bytes that the warning percentage is measured against
  * innodb_tablespace_size_warning_pct (default 85%): Percentage of
    threshold at which to start emitting warnings
- Warning frequency:
  * Below warning_pct: No warnings
  * At or above warning_pct: Every 1% increase (85%, 86%, 87%, etc.)
- Per-tablespace tracking with automatic reset on TRUNCATE/DROP or
  threshold/percentage changes
- Negligible overhead when threshold is 0 (only the threshold is checked)
- Progressive warnings capped at 100%

Implementation adds fil_space_t::extend() which consolidates file
extension, size_in_header update, and size warning checks.
Per-tablespace warning state is tracked in fil_space_t
(m_last_size_warning_pct, m_last_warning_threshold, m_last_warning_pct).

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
@FarihaIS

Contributor Author

@gkodinov thank you for rebasing the branch! I have made all the requested changes from @dr-m's latest review - could you please help confirm if the PR is good to go, or if there are any other changes needed? Thank you

@FarihaIS FarihaIS requested a review from gkodinov June 10, 2026 19:23

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.


7 participants